In today's digital world, slideshows are a widely used way to share information during presentations. Speakers can control their slides with devices such as a mouse, keyboard, or remote with a laser pointer. However, these traditional input devices are not always convenient: presenters must stay close to the equipment or interact with it directly, which restricts their movement. A vision-based alternative can instead identify the hand and its landmarks using a hand detection algorithm, letting the speaker control the presentation from a distance.
Hand gesture recognition technology addresses this need by enabling smoother, more interactive presentations. This project introduces a system that uses hand gestures to control Microsoft PowerPoint, built around a Two-Stream Convolutional Neural Network (CNN). The goal is to provide a fast, intuitive way to manage PowerPoint presentations through hand movement alone. The Two-Stream CNN analyzes both still images and hand motion: the first stream focuses on static hand positions, detecting key landmark points and their spatial relationships in each gesture, while the second stream captures the temporal dynamics of the movement.
Keywords: Hand gesture recognition, hand detection algorithm, presentation control, Two-Stream Convolutional Neural Network
Introduction
This paper describes a real-time hand gesture recognition system designed to control Microsoft PowerPoint presentations without physical input devices. The system uses a Two-Stream Convolutional Neural Network (CNN) to analyze both spatial and temporal features of hand movements for accurate gesture recognition.
Key Points:
Gestures are non-verbal bodily actions conveying messages, often culture-specific and used alongside speech.
The system is implemented as a PowerPoint Controller Web App using Python, Flask, TensorFlow, and other libraries for gesture recognition, data processing, and a responsive UI.
Features include secure user/admin login, uploading datasets and presentations, training the GestureNet CNN model, and real-time presentation control.
The GestureNet model is trained on the large HaGRID dataset (over 550,000 labeled images) with preprocessing steps including resizing, grayscale conversion, noise filtering, binarization, and segmentation (see the preprocessing sketch after this list).
The CNN architecture extracts features through convolutional, activation, and pooling layers, then classifies gestures (e.g., Next Slide, Previous Slide) using fully connected layers with softmax activation (see the model sketch below).
Real-time recognition captures live hand video, preprocesses frames, and applies the Two-Stream Network to analyze spatial (static) and temporal (motion) features simultaneously (see the two-stream sketch below).
Recognized gestures are interpreted and mapped to corresponding PowerPoint commands, enabling seamless slide navigation and presentation control (see the command-mapping sketch below).
The system's performance is evaluated using metrics such as the confusion matrix, accuracy, precision, recall, and F1-score to ensure reliable recognition (see the evaluation sketch below).
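To make the preprocessing pipeline concrete, the following is a minimal Python/OpenCV sketch of the steps listed above (resizing, grayscale conversion, noise filtering, binarization, and segmentation). The function name, input size, and threshold settings are illustrative assumptions, not the project's exact code.

    import cv2
    import numpy as np

    def preprocess_frame(frame, size=(64, 64)):
        """Illustrative pipeline: resize, grayscale, denoise,
        binarize, and segment the largest hand-like region.
        Parameter values are assumptions, not the project's."""
        # Resize to the fixed input size expected by the CNN.
        resized = cv2.resize(frame, size)
        # Convert to grayscale to reduce dimensionality.
        gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)
        # Gaussian blur suppresses sensor noise before thresholding.
        blurred = cv2.GaussianBlur(gray, (5, 5), 0)
        # Otsu's method picks a binarization threshold automatically.
        _, binary = cv2.threshold(blurred, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        # Segment: keep only the largest contour, assumed to be the hand.
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        mask = np.zeros_like(binary)
        if contours:
            hand = max(contours, key=cv2.contourArea)
            cv2.drawContours(mask, [hand], -1, 255, thickness=cv2.FILLED)
        segmented = cv2.bitwise_and(binary, mask)
        # Normalize to [0, 1] and add a channel axis for the CNN.
        return (segmented / 255.0).astype(np.float32)[..., np.newaxis]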
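The convolution/activation/pooling/fully-connected structure described above can be sketched in Keras as follows. The layer counts, filter sizes, and five-class output are assumptions for illustration, since the paper does not list GestureNet's exact hyperparameters.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_gesturenet(input_shape=(64, 64, 1), num_classes=5):
        """Minimal GestureNet-style classifier: stacked Conv2D + ReLU +
        MaxPooling blocks for feature extraction, then dense layers
        with softmax for classification. Sizes are illustrative."""
        model = models.Sequential([
            layers.Input(shape=input_shape),
            # Convolution + activation extract local spatial features.
            layers.Conv2D(32, (3, 3), activation="relu"),
            # Pooling downsamples, adding translation tolerance.
            layers.MaxPooling2D((2, 2)),
            layers.Conv2D(64, (3, 3), activation="relu"),
            layers.MaxPooling2D((2, 2)),
            layers.Flatten(),
            # Fully connected layers combine features for classification.
            layers.Dense(128, activation="relu"),
            # Softmax yields a probability per gesture class
            # (e.g., Next Slide, Previous Slide).
            layers.Dense(num_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model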
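A two-stream design of the kind used in the real-time recognition step might look like the functional-API sketch below, where the spatial stream sees a single preprocessed frame and the temporal stream sees a frame-difference (motion) image. Using frame differences rather than optical flow, and fusing by concatenation, are simplifying assumptions.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_two_stream(input_shape=(64, 64, 1), num_classes=5):
        """Sketch of a Two-Stream CNN: one branch for static (spatial)
        appearance, one for motion (temporal) cues, fused before the
        softmax classifier. Architecture details are assumptions."""
        def stream(name):
            inp = layers.Input(shape=input_shape, name=name)
            x = layers.Conv2D(32, (3, 3), activation="relu")(inp)
            x = layers.MaxPooling2D((2, 2))(x)
            x = layers.Conv2D(64, (3, 3), activation="relu")(x)
            x = layers.MaxPooling2D((2, 2))(x)
            return inp, layers.Flatten()(x)

        # Spatial stream: the current preprocessed frame (hand pose).
        spatial_in, spatial_feat = stream("frame")
        # Temporal stream: difference of consecutive frames (motion).
        temporal_in, temporal_feat = stream("frame_diff")
        # Late fusion by concatenating the two feature vectors.
        fused = layers.Concatenate()([spatial_feat, temporal_feat])
        out = layers.Dense(num_classes, activation="softmax")(fused)
        return models.Model([spatial_in, temporal_in], out)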
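Mapping recognized gestures to PowerPoint commands can be done by synthesizing the keystrokes PowerPoint already understands. This sketch uses the pyautogui library and an assumed label set; the paper does not specify the exact mapping mechanism. A confidence threshold keeps uncertain predictions from triggering slide changes.

    import pyautogui

    # Assumed gesture labels -> PowerPoint keyboard shortcuts.
    GESTURE_TO_KEY = {
        "next_slide": "right",      # advance one slide
        "previous_slide": "left",   # go back one slide
        "start_show": "f5",         # start the slideshow
        "end_show": "esc",          # exit the slideshow
    }

    def execute_gesture(label, confidence, threshold=0.8):
        """Send a keystroke only for confident, known gestures."""
        if confidence >= threshold and label in GESTURE_TO_KEY:
            pyautogui.press(GESTURE_TO_KEY[label])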
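The evaluation metrics named above are standard and can be computed with scikit-learn, as in this brief sketch; y_true and y_pred stand in for the held-out test labels and the model's predictions.

    from sklearn.metrics import (accuracy_score, confusion_matrix,
                                 precision_recall_fscore_support)

    def evaluate(y_true, y_pred):
        """Report the metrics used to assess recognition reliability."""
        # Confusion matrix: rows are true classes, columns predictions.
        print(confusion_matrix(y_true, y_pred))
        print("accuracy:", accuracy_score(y_true, y_pred))
        # Macro-averaging weights all gesture classes equally.
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_true, y_pred, average="macro")
        print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")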
Conclusion
The project concludes with the successful development and implementation of a comprehensive solution for controlling PowerPoint presentations using hand gestures. By integrating various technologies and methodologies, including web development frameworks, deep learning techniques, and real-time video processing, the system offers a sophisticated yet user-friendly platform for enhancing presentation control. Throughout the project, feasibility analysis played a crucial role in assessing the practicality and viability of the proposed solution. Technical feasibility was confirmed through the availability of suitable tools and technologies, such as Python, Flask, TensorFlow, and OpenCV, for implementing the required functionalities. Economic feasibility was established by evaluating the cost-effectiveness of developing and deploying the system against the potential benefits and return on investment.